Ke Li

Jack

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

May 06, 2025

NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization

May 02, 2025

Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages

Mar 26, 2025

Optimal Parameter Adaptation for Safety-Critical Control via Safe Barrier Bayesian Optimization

Mar 25, 2025

VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention

Mar 14, 2025

MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation

Mar 13, 2025

KiteRunner: Language-Driven Cooperative Local-Global Navigation Policy with UAV Mapping in Outdoor Environments

Mar 11, 2025

LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?

Mar 10, 2025

Destroy and Repair Using Hyper Graphs for Routing

Feb 22, 2025

FlowAgent: Achieving Compliance and Flexibility for Workflow Agents

Feb 20, 2025